Measuring Over-Generalization in the Minimal Multiple Generalizations of Biosequences

Authors

  • Yen Kaow Ng
  • Hirotaka Ono
  • Takeshi Shinohara
Abstract

We consider the problem of finding a set of patterns that best characterizes a set of strings. To this end, Arimura et al. [3] considered the use of minimal multiple generalizations (mmgs) for such characterizations. Given a sample set, the mmgs are, roughly speaking, the most (syntactically) specific sets of languages within a given class of languages that contain the sample. Takae et al. [17] found the mmgs of the class of pattern languages [1] that includes so-called sort symbols to be fairly accurate as predictors for signal peptides. We first reproduce their results using updated data. Then, using a measure that estimates the level of over-generalization made by the mmgs, we present results that explain the high accuracy obtained with sort symbols, and discuss how better results can be obtained. The measure that we suggest here can also be applied to other types of patterns, e.g. the PROSITE patterns [4].


Similar resources

Knowledge Discovery in Biosequences Using Sort Regular Patterns

This paper considers knowledge discovery by sort regular patterns, which are strings over sort letters representing finite sets of basic letters. We devise a learning algorithm for the class based on the minimal multiple generalization technique, and evaluate the method by experiments on biosequences from the GenBank database. The experiments show that a relatively simple sort pattern can represent a...


Finding Minimal Multiple Generalization over Regular Patterns with Alphabet Indexing

We propose a learning algorithm that discovers, from biosequences, a motif represented by patterns together with an alphabet indexing. From only positive examples with the help of an alphabet indexing, the algorithm finds k regular patterns as a k-minimal multiple generalization (k-mmg for short). The computational results for transmembrane domains indicate that the combination of k-mmg and alphabet indexing...


Some Generalizations of Locally Closed Sets

Arenas et al. [1] introduced the notion of lambda-closed sets as a generalization of locally closed sets. In this paper, we introduce the notions of lambda-locally closed sets, Lambda_lambda-closed sets and lambda_g-closed sets and obtain some decompositions of closed sets and continuity in topological spaces.


A new characterization for Meir-Keeler condensing operators and its applications

Darbo's fixed point theorem and its generalizations play a crucial role in establishing the existence of solutions to integral equations. Meir-Keeler condensing operators are a generalization of Darbo's fixed point theorem, and most other generalizations are special cases of this result. In recent years, some authors applied these generalizations to solve several special integral equations and some of the...


SOME GENERALIZATIONS OF WEAK CONVERGENCE RESULTS ON MULTIPLE CHANNEL QUEUES IN HEAVY TRAFFIC.

This paper extends certain results of Iglehart and Whitt on multiple channel queues to the case where the inter-arrival times and service times are not necessarily identically distributed. It is shown that the weak convergence results in this case are exactly the same as those obtained by Iglehart and Whitt




Publication date: 2005